The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
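As a concrete illustration of the most commonly reported strategies for oversized samples, the sketch below shows patch-based training crops, naive downsampling, and treating a 3D volume as a series of 2D slices. It is a generic example with assumed array shapes and patch sizes, not code from any surveyed solution.

```python
import numpy as np

def random_patch(volume: np.ndarray, patch_size=(64, 64, 64), rng=None):
    """Crop one random 3D patch from a volume too large to process at once."""
    rng = rng or np.random.default_rng()
    starts = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
    return volume[tuple(slice(st, st + p) for st, p in zip(starts, patch_size))]

def downsample(volume: np.ndarray, factor=2):
    """Naive strided downsampling; real pipelines usually low-pass filter first."""
    return volume[::factor, ::factor, ::factor]

def as_2d_slices(volume: np.ndarray):
    """Solve a 3D analysis task as a series of 2D tasks: iterate axial slices."""
    for z in range(volume.shape[0]):
        yield volume[z]

ct = np.random.rand(128, 256, 256).astype(np.float32)  # synthetic stand-in volume
patch = random_patch(ct)   # (64, 64, 64) training patch
small = downsample(ct)     # (64, 128, 128) coarse volume
```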
Multimodal image-text models have shown remarkable performance in the past few years. However, evaluating their robustness against distribution shifts is crucial before adopting them in real-world applications. In this paper, we investigate the robustness of 9 popular open-source image-text models under common perturbations on five tasks (image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation). In particular, we propose several new multimodal robustness benchmarks by applying 17 image perturbation and 16 text perturbation techniques on top of existing datasets. We observe that multimodal models are not robust to image and text perturbations, especially to image perturbations. Among the tested perturbation methods, character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data. We also introduce two new robustness metrics (MMI and MOR) for properly evaluating multimodal models. We hope our extensive study sheds light on new directions for the development of robust multimodal models.
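The sketch below is a hedged, generic re-implementation of the two perturbation families the paper found most severe, character-level text perturbation and zoom blur for images. It is not the paper's exact benchmark code; the swap rate and zoom factors are illustrative assumptions.

```python
import random
import numpy as np
from scipy.ndimage import zoom

def char_swap(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Character-level perturbation: randomly swap adjacent characters."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def zoom_blur(img: np.ndarray, max_factor: float = 1.15, steps: int = 5) -> np.ndarray:
    """Zoom blur: average the image with progressively zoomed-in center crops."""
    h, w = img.shape[:2]
    out = img.astype(np.float64).copy()
    for f in np.linspace(1.01, max_factor, steps):
        zoomed = zoom(img, (f, f, 1), order=1)        # bilinear upscale
        zh, zw = zoomed.shape[:2]
        top, left = (zh - h) // 2, (zw - w) // 2
        out += zoomed[top:top + h, left:left + w]      # center crop back to h x w
    return (out / (steps + 1)).astype(img.dtype)

print(char_swap("a photo of a cat sitting on a mat"))
blurred = zoom_blur(np.random.rand(64, 64, 3))
```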
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields. However, most previous works assumed that each view is complete and aligned. This leads to an inevitable deterioration in their performance when encountering practical problems such as missing or unaligned views. To address the challenge of representation learning on partially aligned multi-view data, we propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations. Compared with current approaches, the proposed method has the following merits: (1) our model is an end-to-end framework that simultaneously performs view-specific representation learning via view-specific autoencoders and cluster-level data alignment by combining multi-view information with cross-view graph contrastive learning; (2) thanks to the devised cross-view graph contrastive learning, our model can easily be extended to explore information from three or more modalities/sources. Extensive experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on clustering and classification tasks.
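For intuition, the sketch below shows a generic cross-view contrastive objective of the kind such frameworks build on: latent codes of the same instance from two view-specific autoencoders are pulled together while all other pairs are pushed apart. This illustrates the general idea only; the paper's actual loss operates on cross-view graphs at the cluster level.

```python
import torch
import torch.nn.functional as F

def cross_view_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5):
    """z1, z2: (n, d) latent codes from two view-specific encoders, where
    row i of z1 and row i of z2 describe the same (aligned) instance."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau              # (n, n) cross-view similarities
    targets = torch.arange(z1.size(0))      # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = cross_view_contrastive_loss(z1, z2)
```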
In recent years, graph neural networks (GNNs) have shown superior performance in diverse real-world applications. To improve model capacity, besides designing aggregation operations, the design of the GNN topology is also very important. In general, there are two mainstream GNN topology design manners. The first stacks aggregation operations to obtain higher-level features, but performance easily degrades as the network goes deeper. The second uses multiple aggregation operations in each layer, which provides adequate and independent feature extraction over local neighborhoods but makes obtaining higher-level information costly. To enjoy the benefits of both manners while alleviating their corresponding deficiencies, we learn to design the topology of GNNs from a novel feature fusion perspective, and dub the resulting method F$^2$GNN. Specifically, we provide a feature fusion perspective on designing GNN topology and propose a novel framework that unifies existing topology designs through feature selection and fusion strategies. We then develop a neural architecture search method on top of the unified framework, which contains a set of selection and fusion operations in the search space and an improved differentiable search algorithm. Performance gains on eight real-world datasets demonstrate the effectiveness of F$^2$GNN. Further experiments show that F$^2$GNN can alleviate the deficiencies of existing GNN topology design manners, especially the over-smoothing problem, while improving model capacity by adaptively using features at different levels.
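The sketch below illustrates the feature-fusion idea in the simplest possible form: outputs of several GNN layers are combined by candidate fusion operations whose choice is relaxed into learnable weights, as in DARTS-style differentiable search. The two candidate operations and all dimensions are simplifying assumptions, not the F$^2$GNN search space.

```python
import torch
import torch.nn as nn

class DifferentiableFusion(nn.Module):
    """Softly select among candidate fusion ops over per-layer features."""
    def __init__(self, num_ops: int = 2):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_ops))  # architecture parameters

    def forward(self, feats):                 # feats: list of (n, d) layer outputs
        stacked = torch.stack(feats)          # (L, n, d)
        candidates = [
            stacked.sum(dim=0),               # candidate op 1: sum fusion
            stacked.max(dim=0).values,        # candidate op 2: max fusion
        ]
        w = torch.softmax(self.alpha, dim=0)  # relaxed (differentiable) op choice
        return sum(wi * c for wi, c in zip(w, candidates))

fusion = DifferentiableFusion()
layer_outputs = [torch.randn(100, 64) for _ in range(3)]  # 3 GNN layers
fused = fusion(layer_outputs)  # (100, 64); alpha is trained jointly with weights
```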
In federated learning (FL) problems, client sampling plays a key role in the convergence speed of training algorithms. However, despite being an important problem in FL, client sampling remains understudied. In this paper, we propose an online learning with bandit feedback framework to understand the client sampling problem in FL. By adapting an online stochastic mirror descent algorithm to minimize the variance of gradient estimates, we propose a new adaptive client sampling algorithm. Moreover, we use online ensemble methods and the doubling trick to automatically select the tuning parameters in the algorithm. Theoretically, we bound the dynamic regret with respect to a comparator, the theoretically optimal sampling sequence; we also include the total variation of this sequence in our upper bound, which is a natural measure of the intrinsic difficulty of the problem. To the best of our knowledge, these theoretical contributions are novel to the existing literature. Furthermore, through experiments on synthetic and real data, we show empirical evidence of the advantages of our proposed algorithm over the widely used uniform sampling, as well as over other online learning based sampling strategies studied previously. We also examine its robustness to the choice of tuning parameters. Finally, we discuss possible extensions to sampling without replacement and to personalized FL. While the original goal is to solve the client sampling problem, this work has broader applications to stochastic gradient descent and stochastic coordinate descent methods.
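The sketch below is a hedged, simplified stand-in for the idea: adapt the client-sampling distribution online, from bandit feedback (only the sampled client's gradient is observed), toward the variance-minimizing distribution p_i ∝ ||g_i||. The paper uses online stochastic mirror descent with regret guarantees; the running-estimate update here is a simpler substitute chosen for clarity, and all constants are assumptions.

```python
import numpy as np

def adaptive_client_sampling(grad_norm_fn, n_clients, rounds=500,
                             decay=0.95, floor=0.1, seed=0):
    rng = np.random.default_rng(seed)
    p = np.ones(n_clients) / n_clients     # start from uniform sampling
    est = np.ones(n_clients)               # running estimates of E[||g_i||^2]
    for _ in range(rounds):
        i = rng.choice(n_clients, p=p)     # bandit feedback: observe client i only
        est[i] = decay * est[i] + (1 - decay) * grad_norm_fn(i) ** 2
        target = np.sqrt(est) / np.sqrt(est).sum()    # p_i ∝ ||g_i|| minimizes
        p = (1 - floor) * target + floor / n_clients  # the variance; keep exploring
    return p

norms = np.linspace(0.1, 2.0, 10)          # heterogeneous client gradient norms
p = adaptive_client_sampling(lambda i: norms[i], n_clients=10)
print(np.round(p, 3))                      # mass shifts toward large-||g|| clients
```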
In recent years, graph neural networks (GNNs) have shown superior performance on diverse applications over real-world datasets. To improve model capacity and alleviate the over-smoothing problem, several methods have been proposed to incorporate intermediate layers via layer-wise connections. However, due to highly diverse graph types, the performance of existing methods varies across different graphs, leading to a need for data-specific layer-wise connection methods. To address this problem, we propose a novel framework, LLC (Learn Layer-wise Connections), based on neural architecture search (NAS) to learn adaptive connections among intermediate layers in GNNs. LLC contains a novel search space consisting of three types of blocks and learnable connections, as well as a differentiable search process to enable an efficient search procedure. Extensive experiments are conducted on five real-world datasets, and the results show that the searched layer-wise connections not only improve performance but also alleviate the over-smoothing problem.
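As an illustration of learnable layer-wise connections, the sketch below gives every edge from an earlier block's output to a later block's input a relaxed, trainable gate, so the connection pattern is searched by gradient descent and discretized afterwards. The block internals (plain Linear layers standing in for GNN aggregation blocks) and the sigmoid gating are assumptions, not LLC's exact search space.

```python
import torch
import torch.nn as nn

class LearnedConnections(nn.Module):
    def __init__(self, dim: int, n_blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))
        # one architecture parameter per candidate connection j -> k (j < k)
        self.gates = nn.ParameterDict({
            f"{j}_{k}": nn.Parameter(torch.zeros(1))
            for k in range(1, n_blocks) for j in range(k)
        })

    def forward(self, x):
        outs = [torch.relu(self.blocks[0](x))]
        for k in range(1, len(self.blocks)):
            # input of block k: gated sum over all earlier block outputs
            inp = sum(torch.sigmoid(self.gates[f"{j}_{k}"]) * outs[j]
                      for j in range(k))
            outs.append(torch.relu(self.blocks[k](inp)))
        return outs[-1]

model = LearnedConnections(dim=64)
y = model(torch.randn(32, 64))   # gates are trained jointly, then thresholded
```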
Action anticipation aims to infer forthcoming human actions from partially observed videos, which is challenging due to the limited information provided by early observations. Existing methods mainly adopt a reconstruction strategy for this task, expecting to learn a single mapping function from partial observations to full videos to facilitate the anticipation process. In this study, we propose an adversarial memory network (AMemNet) to generate the "full video" feature conditioned on a partial-video query, from two new aspects. First, a key-value structured memory generator is designed to memorize different partial videos as key memories and to dynamically write full videos into value memories with a gating mechanism and query attention. Second, we develop a class-aware discriminator to guide the memory generator to deliver not only realistic but also discriminative full-video features during adversarial training. The final anticipation result of AMemNet is obtained by late fusion over RGB and optical flow streams. Extensive experimental results on two benchmark video datasets, UCF-101 and HMDB51, demonstrate the effectiveness of the proposed AMemNet model over state-of-the-art methods.
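The sketch below is a generic rendering of the key-value memory read the abstract describes: a partial-video query attends over key memories and retrieves a gated blend of value memories ("full video" features). The dimensions, single-head attention, and gating form are illustrative assumptions, not AMemNet's exact architecture.

```python
import torch
import torch.nn as nn

class KeyValueMemory(nn.Module):
    def __init__(self, slots: int = 64, dim: int = 512):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(slots, dim))    # partial-video keys
        self.values = nn.Parameter(torch.randn(slots, dim))  # full-video values
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, query):                    # query: (b, dim) partial feature
        attn = torch.softmax(query @ self.keys.t(), dim=-1)  # (b, slots)
        read = attn @ self.values                # (b, dim) retrieved full feature
        g = torch.sigmoid(self.gate(torch.cat([query, read], dim=-1)))
        return g * read + (1 - g) * query        # gated blend of memory and query

mem = KeyValueMemory()
full_feat = mem(torch.randn(8, 512))  # would be fed to a class-aware discriminator
```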
In this paper, we consider the problem of multi-view clustering on incomplete views. Compared with complete multi-view clustering, the view-missing problem increases the difficulty of learning a common representation across views. To address this challenge, we propose a novel incomplete multi-view clustering framework, which incorporates cross-view relation transfer and multi-view fusion learning. Specifically, based on the consistency existing in multi-view data, we design a cross-view relation transfer based completion module, which transfers known similar inter-instance relationships to the missing view and recovers the missing data via graph networks based on the transferred relationship graph. Then, view-specific encoders are designed to extract features from the recovered multi-view data, and an attention-based fusion layer is introduced to obtain the common representation. Moreover, to reduce the impact of errors caused by inconsistency between views and to obtain a better clustering structure, a joint clustering layer is introduced to optimize recovery and clustering simultaneously. Extensive experiments conducted on several real datasets demonstrate the effectiveness of the proposed method.
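As a minimal sketch of the cross-view relation-transfer idea, the code below computes an instance similarity graph on a complete view, transfers it to a view with missing rows, and recovers each missing sample from its transferred neighbors. The paper does this with graph networks over learned representations; plain kNN averaging here is a simplified stand-in.

```python
import numpy as np

def transfer_and_impute(view_a, view_b, missing_mask, k=5):
    """view_a: (n, da) complete view; view_b: (n, db) with rows missing where
    missing_mask is True. Neighbors are found in view_a, values read in view_b."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    sim = a @ a.T                             # cosine relation graph on view_a
    np.fill_diagonal(sim, -np.inf)
    sim[:, missing_mask] = -np.inf            # only observed rows can be neighbors
    recovered = view_b.copy()
    for i in np.where(missing_mask)[0]:
        nbrs = np.argsort(sim[i])[-k:]        # k most similar observed instances
        recovered[i] = view_b[nbrs].mean(axis=0)
    return recovered

n = 100
view_a = np.random.rand(n, 20)
view_b = np.random.rand(n, 30)                     # missing rows are placeholders
mask = np.zeros(n, dtype=bool); mask[:20] = True   # first 20 rows of view_b missing
completed = transfer_and_impute(view_a, view_b, mask)
```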
Hypergraphs allow modeling problems with multi-way high-order relationships. However, the computational cost of most existing hypergraph-based algorithms can depend heavily on the size of the input hypergraph. To address the ever-increasing computational challenges, graph coarsening can potentially be applied to preprocess a given hypergraph by aggressively aggregating its vertices (nodes). However, state-of-the-art hypergraph partitioning (clustering) methods that incorporate heuristic graph coarsening techniques are not optimized for preserving the structural (global) properties of hypergraphs. In this work, we propose an efficient spectral hypergraph coarsening scheme (HyperSF) for preserving the original spectral (structural) properties of hypergraphs. Our approach leverages a recent strongly-local max-flow-based clustering algorithm for detecting sets of hypergraph vertices that minimize the ratio cut. To further improve algorithm efficiency, we propose a divide-and-conquer scheme that leverages spectral clustering of the bipartite graph corresponding to the original hypergraph. Experimental results on a variety of hypergraphs extracted from real-world VLSI design benchmarks show that the proposed hypergraph coarsening algorithm can significantly improve the multi-way conductance of hypergraph clustering as well as runtime efficiency, compared with prior state-of-the-art methods.
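The sketch below illustrates only the divide-and-conquer ingredient: representing a hypergraph as its star-expansion bipartite graph (vertices on one side, hyperedges on the other) and spectrally clustering it to split the work. HyperSF's flow-based local clustering and the coarsening itself are not reproduced; the toy hypergraph is an assumption.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def star_expansion_adjacency(hyperedges, n_vertices):
    """Bipartite adjacency of the star expansion: rows/cols are the
    n_vertices vertex nodes followed by one node per hyperedge."""
    m = len(hyperedges)
    A = np.zeros((n_vertices + m, n_vertices + m))
    for e, verts in enumerate(hyperedges):
        for v in verts:
            A[v, n_vertices + e] = A[n_vertices + e, v] = 1.0
    return A

hyperedges = [(0, 1, 2), (1, 2, 3), (4, 5), (5, 6, 7), (6, 7)]
A = star_expansion_adjacency(hyperedges, n_vertices=8)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(labels[:8])   # cluster assignment of the 8 original vertices
```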
Temperature field reconstruction of heat source systems (TFR-HSS) with limited monitoring sensors, a problem arising in thermal management, plays an important role in real-time health detection systems for electronic equipment in engineering. However, prior methods based on common interpolations usually cannot provide the required reconstruction accuracy. In addition, no public dataset exists for the broad study of reconstruction methods to further boost reconstruction performance and engineering applications. To overcome this problem, this work develops a machine learning modelling benchmark for the TFR-HSS task. First, the TFR-HSS task is mathematically modelled from the real-world engineering problem, and four types of numerical modelling are constructed to transform the problem into discrete mapping forms. Then, this work proposes a set of machine learning modelling methods, including general machine learning methods and deep learning methods, to advance the state of the art in temperature field reconstruction. More importantly, this work develops a novel benchmark dataset, namely the Temperature Field Reconstruction Dataset (TFRD), to evaluate these machine learning modelling methods on the TFR-HSS task. Finally, a performance analysis of typical methods is given on TFRD, which can serve as baseline results on this benchmark.
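The sketch below is a hedged illustration of the discrete mapping the benchmark formalizes: learn a regression from k sensor readings to the full temperature field, here with a small MLP. TFRD provides the real fields; the Gaussian heat-source simulator below is only a stand-in so the example runs end to end.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def synth_field(rng, size=16):
    """Toy temperature field: sum of a few random Gaussian heat sources."""
    yy, xx = np.mgrid[0:size, 0:size] / size
    field = np.zeros((size, size))
    for _ in range(3):
        cx, cy, power = rng.random(), rng.random(), rng.random() + 0.5
        field += power * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 0.02)
    return field

rng = np.random.default_rng(0)
sensors = rng.choice(16 * 16, size=20, replace=False)   # 20 monitoring points
fields = np.stack([synth_field(rng).ravel() for _ in range(500)])
X, y = fields[:, sensors], fields                        # readings -> full field
model = MLPRegressor(hidden_layer_sizes=(256,), max_iter=300, random_state=0)
model.fit(X[:400], y[:400])
print("test MSE:", np.mean((model.predict(X[400:]) - y[400:]) ** 2))
```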